A stochastic bandit algorithm for scratch games
Abstract
Stochastic multi-armed bandit algorithms are used to solve the exploration-exploitation dilemma in sequential optimization problems. Algorithms based on upper confidence bounds offer strong theoretical guarantees, are easy to implement, and are efficient in practice. We consider a new bandit setting, called “scratch games”, where arm budgets are limited and rewards are drawn without replacement. Using Serfling's inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that the bound on the expected number of plays of a suboptimal arm is lower than that of the UCB1 policy. We illustrate this result on both synthetic and realistic problems (ad serving and optimization of emailing campaigns).
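To make the setting concrete, here is a minimal Python sketch of an upper confidence bound policy for draws without replacement. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the abstract does not give the index formula, so the confidence radius below is simply the UCB1 radius multiplied by the finite-population factor 1 - (n - 1)/N that appears in Serfling's inequality, with rewards assumed to lie in [0, 1]. The names `serfling_bonus`, `hoeffding_bonus`, and `play_scratch_games` are hypothetical.

```python
import math
import random


def hoeffding_bonus(n_plays, t):
    """UCB1-style confidence radius for rewards in [0, 1]."""
    return math.sqrt(2.0 * math.log(t) / n_plays)


def serfling_bonus(n_plays, budget, t):
    """Assumed radius: UCB1 radius shrunk by Serfling's finite-population factor.

    n_plays tickets have been drawn without replacement from an arm
    that holds `budget` tickets in total.
    """
    shrink = 1.0 - (n_plays - 1) / budget
    return math.sqrt(shrink * 2.0 * math.log(t) / n_plays)


def play_scratch_games(games, horizon, seed=0):
    """Play `horizon` rounds; each game is a finite list of rewards in [0, 1]."""
    rng = random.Random(seed)
    urns = [list(g) for g in games]
    for urn in urns:
        rng.shuffle(urn)  # random ticket order, so pop() draws without replacement
    budgets = [len(urn) for urn in urns]
    counts = [0] * len(urns)
    sums = [0.0] * len(urns)
    total = 0.0
    for t in range(1, horizon + 1):
        alive = [k for k in range(len(urns)) if counts[k] < budgets[k]]
        if not alive:
            break  # every arm has exhausted its budget
        untried = [k for k in alive if counts[k] == 0]
        if untried:
            arm = untried[0]  # initialization: try each arm once
        else:
            arm = max(
                alive,
                key=lambda k: sums[k] / counts[k]
                + serfling_bonus(counts[k], budgets[k], t),
            )
        reward = urns[arm].pop()
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


if __name__ == "__main__":
    # Two hypothetical games with 10 tickets each: 3 winners vs. 1 winner.
    games = [[1] * 3 + [0] * 7, [1] * 1 + [0] * 9]
    print(play_scratch_games(games, horizon=12))
```

In this sketch the radius equals the UCB1 radius on the first draw and shrinks faster as an arm's budget is consumed, which matches the intuition that sampling without replacement reveals more about a finite population than sampling with replacement does.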
Similar resources
Robust Learning for Repeated Stochastic Games via Meta-Gaming
This paper addresses learning in repeated stochastic games (RSGs) played against unknown associates. Learning in RSGs is extremely challenging due to their inherently large strategy spaces. Furthermore, these games typically have multiple (often infinite) equilibria, making attempts to solve them via equilibrium analysis and rationality assumptions wholly insufficient. As such, previous learnin...
Individual Q-Learning in Normal Form Games
The single-agent multi-armed bandit problem can be solved by an agent that learns the values of each action using reinforcement learning. However, the multi-agent version of the problem, the iterated normal form game, presents a more complex challenge, since the rewards available to each agent depend on the strategies of the others. We consider the behavior of value-based learning agents in this...
Minimax Policies for Bandits Games
This work deals with four classical prediction games, namely full information, bandit and label efficient (full information or bandit) games as well as three different notions of regret: pseudo-regret, expected regret and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster) based on an arbitrary function ψ for which we propose a unified analysis...
Improved Algorithms for Linear Stochastic Bandits
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer’s UCB algorithm (Auer, 2002) achieves with high probability constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm f...
Gap-free Bounds for Stochastic Multi-Armed Bandit
We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror descent type algorithm. We consider separately two assumptions: nonnegative losses or arbitrary losses with an exponential moment condition. We prove optimal (up to logarithmic factors) gap-free bo...